

fa2246fa0fdf0d3e270c86767b77ba1b-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their careful reading of and feedback on our submission. We did not pursue theoretical results for PIA because of its lackluster empirical performance. In Line 99, we will change "gradient" to "subgradient". The definitions of interpolation we use are given in [3]. We cap the iterations in the simulations at 1000; we will note this in the final version of the paper.





Reviews: A theory on the absence of spurious solutions for nonconvex and nonsmooth optimization

Neural Information Processing Systems

This paper studies conditions for the absence of spurious optimality. In particular, the authors introduce 'global functions' to define the set of continuous functions that admit no spurious local optima (in the sense of sets), and develop corresponding definitions and propositions that extend this characterization to continuous functions admitting no spurious strict local optima. The authors also apply their theory to l1-norm minimization in tensor decomposition. Pros: In my opinion, the main contribution of this paper is to establish a general mathematical result and apply it to study the absence of spurious optimality for a specific problem. I also find some of the mathematical discoveries on global functions interesting, which include: -- In section 2, the paper provides two examples to show that: (i).
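As an illustrative aside (not the paper's formal definition of 'global functions'), the 'no spurious local optima' property can be checked numerically in one dimension: every local minimizer found on a grid should attain the global minimum value. A minimal sketch, with two hypothetical test functions:

```python
def local_minima(f, lo, hi, n=2001):
    """Grid approximation of the local minimizers of f on [lo, hi] (interior points only)."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [f(x) for x in xs]
    return [(xs[i], ys[i]) for i in range(1, n - 1)
            if ys[i] <= ys[i - 1] and ys[i] <= ys[i + 1]]

# |x| has a unique local minimum, which is global: no spurious local optima.
mins_abs = local_minima(abs, -2.0, 2.0)

# x^4 - x^2 + 0.3*x has two local minima with different values:
# the shallower one is a spurious local optimum.
f = lambda x: x**4 - x**2 + 0.3 * x
mins_poly = local_minima(f, -2.0, 2.0)
```

This only illustrates the concept on a grid; the paper's results are stated for continuous functions and sets of optima.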


Reviews: Provably Correct Automatic Sub-Differentiation for Qualified Programs

Neural Information Processing Systems

In this submission, the authors consider the problem of automatically and correctly computing sub-derivatives for a class of non-smooth functions. They give a very nice example illustrating problems with current automatic differentiation frameworks, such as TensorFlow and PyTorch. Then, the authors prove a chain rule for the one-sided directional derivative of a composite non-smooth function satisfying certain assumptions. Based on this rule, the authors derive a (randomized) algorithm for computing such derivatives for a particular class of programs with only constant overhead. The algorithm is very similar to backward automatic differentiation, except that its forward computation is based on the newly proved chain rule from the submission rather than the standard chain rule for differentiation.
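The failure mode this review refers to can be reproduced without any framework. A minimal sketch (illustrative, not the authors' algorithm): forward-mode AD with dual numbers, where relu uses the common convention relu'(0) = 0; on f(x) = relu(x) - relu(-x), which equals the identity function, AD then reports derivative 0 at x = 0 instead of the true value 1.

```python
# Tiny forward-mode AD with dual numbers (value, derivative).
class Dual:
    def __init__(self, val, dot):
        self.val, self.dot = val, dot
    def __sub__(self, o):
        return Dual(self.val - o.val, self.dot - o.dot)
    def __neg__(self):
        return Dual(-self.val, -self.dot)

def relu(x):
    # The subgradient choice relu'(0) = 0 zeroes the derivative at the kink,
    # matching the convention used by TensorFlow and PyTorch.
    return Dual(max(x.val, 0.0), x.dot if x.val > 0 else 0.0)

def f(x):
    # f(x) = relu(x) - relu(-x) is the identity, so f'(x) = 1 everywhere.
    return relu(x) - relu(-x)

y = f(Dual(0.0, 1.0))
print(y.val, y.dot)  # AD reports derivative 0.0 at x = 0; the true derivative is 1.
```

Away from the kink the usual chain rule is fine (f'(2) = 1 is computed correctly); the inconsistency arises only at nondifferentiable points, which is exactly the regime the submission addresses.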




Pointwise convergence theorem of gradient descent in sparse deep neural network

Yoneda, Tsuyoshi

arXiv.org Artificial Intelligence

The theoretical structure of deep neural networks (DNNs) has been clarified gradually. Imaizumi-Fukumizu (2019) and Suzuki (2019) showed that the learning ability of DNNs is superior to previous theories when the target functions are non-smooth. However, as far as the author is aware, none of the numerous works to date has attempted to mathematically investigate which DNN architectures actually induce pointwise convergence of gradient descent (without any statistical argument), an attempt that seems closer to practical DNNs. In this paper we restrict the target functions to non-smooth indicator functions, and construct a deep neural network for which the gradient descent process in a ReLU-DNN induces pointwise convergence. The DNN has a sparse, special shape with certain variable transformations.
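A much simpler setting than the paper's (hypothetical illustration, not the paper's construction): fix two sparse ReLU "ramp" features that bracket the interval [0, 1], and run plain gradient descent only on the output weights to fit the indicator of [0, 1]. With the features frozen, the squared loss is convex in the weights, so gradient descent converges; all constants below are arbitrary choices.

```python
import random

relu = lambda t: max(t, 0.0)
# Two sparse ReLU features: narrow ramps rising from 0 to 1 at x = 0 and x = 1.
feats = lambda x: [relu(20 * x) - relu(20 * x - 1),
                   relu(20 * (x - 1)) - relu(20 * (x - 1) - 1)]
target = lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0  # indicator of [0, 1]

random.seed(0)
xs = [random.uniform(-1.0, 2.0) for _ in range(200)]

w = [0.0, 0.0]
for _ in range(500):  # gradient descent on the mean squared error
    g = [0.0, 0.0]
    for x in xs:
        phi = feats(x)
        err = w[0] * phi[0] + w[1] * phi[1] - target(x)
        g[0] += 2 * err * phi[0] / len(xs)
        g[1] += 2 * err * phi[1] / len(xs)
    w = [w[0] - 0.5 * g[0], w[1] - 0.5 * g[1]]

model = lambda x: w[0] * feats(x)[0] + w[1] * feats(x)[1]
```

The learned weights land near (1, -1), so the model is approximately ramp-up-at-0 minus ramp-up-at-1, i.e. the indicator up to the ramp width. The paper instead trains the full network and proves pointwise convergence for a specially structured sparse architecture.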


Adaptive Gradient Methods for Constrained Convex Optimization

Ene, Alina, Nguyen, Huy L., Vladu, Adrian

arXiv.org Machine Learning

Gradient methods are a fundamental building block of modern machine learning. Their scalability and small memory footprint make them exceptionally well suited to the massive volumes of data used for present-day learning tasks. While such optimization methods perform very well in practice, one of their major limitations is their inability to converge faster by taking advantage of specific features of the input data. For example, the training data used for classification tasks may exhibit a few very informative features, while all the others have only marginal relevance. Having access to this information a priori would enable practitioners to appropriately tune first-order optimization methods, thus allowing them to train much faster. Lacking this knowledge, one may attempt to reach a similar performance by very carefully tuning hyper-parameters, which are all specific to the learning model and input data. This limitation has motivated the development of adaptive methods, which, in the absence of prior knowledge concerning the importance of various features in the data, adapt their learning rates based on the information acquired in previous iterations. The most notable example is AdaGrad [13], which adaptively modifies the learning rate corresponding to each coordinate in the vector of weights. Following its success, a host of new adaptive methods appeared, including Adam [17], AmsGrad [27], and Shampoo [14], which attained optimal rates for generic online learning tasks.
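The per-coordinate adaptation described above can be sketched in a few lines. This is a minimal diagonal AdaGrad (illustrative only: the test objective and constants are hypothetical, and the projection step needed for the constrained case studied in the paper is omitted).

```python
import math

def adagrad(grad, x0, lr=0.5, eps=1e-8, steps=200):
    """Diagonal AdaGrad: coordinate i takes steps of size lr / sqrt(sum of g_i^2)."""
    x = list(x0)
    accum = [0.0] * len(x)  # running sum of squared gradients, per coordinate
    for _ in range(steps):
        g = grad(x)
        for i in range(len(x)):
            accum[i] += g[i] ** 2
            x[i] -= lr * g[i] / (math.sqrt(accum[i]) + eps)
    return x

# Hypothetical ill-conditioned quadratic: f(x) = 50*x0^2 + 0.5*x1^2.
grad = lambda x: [100.0 * x[0], 1.0 * x[1]]
x = adagrad(grad, [1.0, 1.0])
```

Note that both coordinates make similar per-step progress despite the 100x difference in curvature; that automatic rescaling is the point of the method.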


Deep Neural Networks Learn Non-Smooth Functions Effectively

Imaizumi, Masaaki, Fukumizu, Kenji

arXiv.org Machine Learning

We theoretically discuss why deep neural networks (DNNs) perform better than other models in some cases by investigating statistical properties of DNNs for non-smooth functions. While DNNs have empirically shown higher performance than other standard methods, understanding the mechanism is still a challenging problem. From the perspective of statistical theory, it is known that many standard methods attain optimal convergence rates, and thus it has been difficult to find theoretical advantages of DNNs. This paper fills this gap by considering learning of a certain class of non-smooth functions, which was not covered by the previous theory. We derive convergence rates of estimators by DNNs with a ReLU activation, and show that the estimators by DNNs are almost optimal for estimating the non-smooth functions, while some of the popular models do not attain the optimal rate. In addition, our theoretical result provides guidelines for selecting an appropriate number of layers and edges of DNNs. We provide numerical experiments to support the theoretical results.
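As a small aside (an elementary identity, not the paper's estimator): ReLU networks can represent simple non-smooth functions exactly, which is one intuition for why they handle non-smooth targets well. For example, |x| is a two-hidden-unit ReLU network:

```python
# |x| is non-smooth at 0, yet a two-unit ReLU layer represents it exactly:
# |x| = relu(x) + relu(-x).
relu = lambda t: max(t, 0.0)
abs_net = lambda x: relu(x) + relu(-x)

assert all(abs_net(x) == abs(x) for x in [-2.0, -0.5, 0.0, 1.5])
```

Smooth models such as polynomial or kernel regressors can only approximate the kink at 0, which is the kind of gap the paper's rate comparison makes precise.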


Communication Complexity of Distributed Convex Learning and Optimization

Arjevani, Yossi, Shamir, Ohad

Neural Information Processing Systems

We study the fundamental limits to communication-efficient distributed methods for convex learning and optimization, under different assumptions on the information available to individual machines, and the types of functions considered. We identify cases where existing algorithms are already worst-case optimal, as well as cases where room for further improvement is still possible. Among other things, our results indicate that without similarity between the local objective functions (due to statistical data similarity or otherwise) many communication rounds may be required, even if the machines have unbounded computational power.